update proposal for maxUnavailable for statefulsets #1010

Merged · 1 commit merged on Sep 12, 2019
Conversation

krmayankk

/sig apps

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 27, 2019
@krmayankk
Author

/assign @bgrant0607 @kow3ns @janetkuo @Kargakis

@justaugustus
Member

/assign @bgrant0607 @kow3ns @janetkuo @Kargakis

@krmayankk krmayankk changed the title update maxUnavailable for statefulsets update proposal for maxUnavailable for statefulsets Apr 29, 2019
@krmayankk krmayankk mentioned this pull request Apr 29, 2019
@krmayankk
Author

FYI @furykerry @resouer @FillZpp

@FillZpp

FillZpp commented Apr 30, 2019

Great, the description of the choices is very clear. Also, I still recommend Choice 3; it keeps a balance between faster updates and the ordering guarantee.

The problem with Choice 1 is that it looks simple, but actually is not. Let me walk through an example:

  1. First the StatefulSet with replicas 4 has pods [0-3]; then we update the template and set maxUnavailable to 2.
  2. The StatefulSet controller starts to update pods 2 and 3. With Choice 1, the controller will update the remaining pods only once pods 2 and 3 are ready. But if pod 3 becomes ready first, the currently unavailable pod is 2; how does the controller know it should wait for 2 to be ready before updating the remaining pods?
  3. You may say the controller calculates the pod update groups from the maxUnavailable number, like [{0, 1}, {2, 3}]. But if replicas is changed to 5 while pods 2 and 3 are updating, the controller will split the pods into [{0}, {1, 2}, {3, 4}].
  4. Then you will find that pods 2 and 3, which are in different groups, are updating together. Surprise! (See the sketch below.)
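
To make point 3 concrete, here is a minimal sketch (a hypothetical helper, not actual StatefulSet controller code) showing how fixed-size update groups derived from maxUnavailable shift when replicas changes from 4 to 5 mid-update:

```go
package main

import "fmt"

// updateGroups splits pod ordinals [0..replicas-1] into groups of at most
// maxUnavailable, walking from the highest ordinal down, the way a naive
// Choice 1 implementation might.
func updateGroups(replicas, maxUnavailable int) [][]int {
	var groups [][]int
	for high := replicas - 1; high >= 0; high -= maxUnavailable {
		low := high - maxUnavailable + 1
		if low < 0 {
			low = 0
		}
		group := []int{}
		for i := low; i <= high; i++ {
			group = append(group, i)
		}
		// Prepend so the groups read low-to-high.
		groups = append([][]int{group}, groups...)
	}
	return groups
}

func main() {
	fmt.Println(updateGroups(4, 2)) // [[0 1] [2 3]]
	fmt.Println(updateGroups(5, 2)) // [[0] [1 2] [3 4]] -- pods 2 and 3 now fall into different groups
}
```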

@krmayankk
Author

You may say the controller calculates the pod update groups from the maxUnavailable number, like [{0, 1}, {2, 3}]. But if replicas is changed to 5 while pods 2 and 3 are updating, the controller will split the pods into [{0}, {1, 2}, {3, 4}].

Is this even possible? I believe the replicas cannot be scaled to 5 before the update finishes. The update knows to start from the highest ordinal and take maxUnavailable pods at a time.

Note the goal is faster rolling updates. For faster rolling updates, we could potentially provide two modes: one with ordering and one where ordering within the maxUnavailable group is not guaranteed. I don't think we have enough data to say we need both modes. We can start with the option that makes the most sense for faster updates. The consumer of faster updates would know the ordering is not guaranteed when using this feature, and that is a conscious choice the user will make.

@FillZpp

FillZpp commented May 3, 2019

Is this even possible? I believe the replicas cannot be scaled to 5 before the update finishes.

Actually, it can. You may test this case in your Kubernetes cluster or minikube, for example.

So as I said before, a controller implementing Choice 1 will be confused when the user modifies replicas during an update.

with ordinal 2 will start Terminating. Pods with ordinal 0 and 1 will remain untouched due the partition. In this choice, the number of pods
terminating is not always maxUnavailable, but sometimes less than that. For e.g. if pod with ordinal 3 is running and ready but 4 is not, we
still wait for 4 to be running and ready before moving on to 2. This implementation avoids out of order Terminations of pods.
2: Pods with ordinal 4 and 3 will start Terminating at the same time(because of maxUnavailable). When any of 4 or 3 are running and ready, pods
Contributor

The three alternatives are getting merged into one paragraph; needs an extra newline between each alternative.

Author

add a newline

#### Implementation

TBD: Will be updated after we have agreed on the semantics beign discussed above.
Member

being


I recommend Choice 1, simply because its easy to think and reason about.

Other choices are introduce a flag, which controls whether we want maxUnavailable with ordering or without ordering. I dont think we need that
Member

List this as Choice 4?

@krmayankk
Author

Is this even possible? I believe the replicas cannot be scaled to 5 before the update finishes.

Actually, it can. You may test this case in your Kubernetes cluster or minikube, for example.

So as I said before, a controller implementing Choice 1 will be confused when the user modifies replicas during an update.

We can specify in the documentation what the behavior will be. Note that changing the grouping is not a big deal. The main goal is faster updates; whether a pod gets updated in one grouping or another doesn't matter, unless you have a good reason that it matters.
Happy to hear others' opinions.

@krmayankk
Author

Adding more people to see if they have opinions on the choices presented here
@smarterclayton @thockin @brendandburns @kubernetes/sig-apps-feature-requests

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jun 18, 2019
about. They are used by DaemonSet and StatefulSet for tracking revisions. It would be so much nicer if all the use cases of Deployments can be met and we
could track the revisions by ControllerRevisions.
4. Sometimes I just want easier tracking of revisions of a rolling update. Deployment does it through ReplicaSets and has its own nuances. Understanding
that requires diving into the complicacy of hashing and how ReplicaSets are named. Over and above that, there are some issues with hash collisions which
Contributor

I feel like this point should be made clear in feedback to @kubernetes/sig-apps-api-reviews - are there any supporting statements we should have here about changes that should happen in deployments?

@k8s-ci-robot k8s-ci-robot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Jun 18, 2019
that requires diving into the complicacy of hashing and how ReplicaSets are named. Over and above that, there are some issues with hash collisions which
further complicate the situation(I know they were solved). StatefulSet introduced ControllerRevisions in 1.7 which are much easier to think and reason
about. They are used by DaemonSet and StatefulSet for tracking revisions. It would be so much nicer if all the use cases of Deployments can be met in
StatefulSet's and additionally we could track the revisions by ControllerRevisions.
Contributor

Hrm - that sounds like a dangerous statement. Are you saying stateless deployments should be implemented by StatefulSets?

Author

@smarterclayton I will reword this. What I am trying to say is that sometimes people just want the one extra thing that StatefulSets provide, which is stable identity, without the storage part, while everything else is exactly like Deployments. They stick with Deployments because StatefulSet doesn't have maxUnavailable. If StatefulSet had maxUnavailable, they could move to it and additionally take advantage of better tracking of versions.

In general I do feel that the distinction between StatefulSet and Deployment is blurring. The only advantage I can think of at this point for using Deployments over StatefulSets is that Deployments can maxSurge while StatefulSets cannot because of their identity. But making them one object is a different discussion.


### Goals
StatefulSet RollingUpdate strategy will contain an additional parameter called `maxUnavailable` to control how many Pods will be brought down at a time,
during Rolling Update.

### Non-Goals
maxUnavailable is only implemeted to affect the Rolling Update of StatefulSet. Considering maxUnavailable for Pod Management Policy of Parallel is beyond
maxUnavailable is only implemented to affect the Rolling Update of StatefulSet. Considering maxUnavailable for Pod Management Policy of Parallel is beyond
Contributor

Why?

Author

@smarterclayton Because podManagementPolicy is only applicable during StatefulSet creation, deletion, and scaling, while the main use case here is faster rolling updates. We could make maxUnavailable apply to create/delete/scale as well, but that is secondary to the goal. I thought we could limit the scope of the initial implementation to make it less confusing, but if you think we should apply this to both, that is fine as well.

Contributor

So during an upgrade, if 3 of 5 nodes are evacuated, PMP is ignored? Since this is mostly established behavior of ReplicaSets and Deployments, you should mention why it's out of scope. I am more concerned with creating something that differs between RS/Deployment and StatefulSet right now than I am about implementation ordering, so I just want the KEP to reflect that it is a) consistent with user expectations and b) justifies statements about timing with reasons.

@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.
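
For readers following the API discussion, here is a rough sketch of where such a field could live, assuming it mirrors the maxUnavailable field on Deployment's RollingUpdate strategy (the exact shape and defaulting are for the KEP and API review to settle):

```go
package apps

import "k8s.io/apimachinery/pkg/util/intstr"

// Sketch only; field placement assumed, not the merged API.
type RollingUpdateStatefulSetStrategy struct {
	// Partition exists today and is unchanged.
	Partition *int32 `json:"partition,omitempty"`
	// MaxUnavailable is the proposed addition: the maximum number of pods
	// that can be unavailable during a rolling update, as an absolute
	// number or a percentage of replicas.
	MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`
}
```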

this time, only 1 is unavailable although we requested 2.
are the possible behavior choices we have:-

1. Pods with ordinal 4 and 3 will start Terminating at the same time(because of maxUnavailable). Once they are both running and ready, pods
Author

@smarterclayton do you have any suggestions/opinions on which of these semantics makes more sense when implementing maxUnavailable?

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 18, 2019
@krmayankk
Author

@smarterclayton @kow3ns Based on discussion in the SIG Apps meeting and offline discussion with Clayton, I have updated the proposal. Please take a look and provide feedback.


##### Recommended Choice

I recommend Choice 4, using PMP=Parallel for the first Alpha Phase. This would give the users fast
Member

In particular, I think users will want the behavior of (1) above if they are using Ordered pod management and the behavior of (2) if they are using Parallel. Consider implementing just these semantics. That is, if the PMP is Ordered, the user is declaring that they care about termination ordering, even if they are willing to tolerate a larger number of disruptions during an update. If the PMP is Parallel, the user does not care about the termination ordering during turn-up/turn-down. If they do care about the termination ordering during an update, they can set the maxUnavailable field to 1 to preserve the current behavior. If they wish to tolerate a larger number of disruptions, they can increase its value.
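
The two behaviors described above can be summarized in a small sketch (simplified decision logic with hypothetical names, not the real StatefulSet controller):

```go
package main

import "fmt"

// nextToTerminate sketches the semantics described above. ordinals lists the
// not-yet-updated pod ordinals, highest first (the order a RollingUpdate walks
// them); notReady is how many already-updated pods are still coming back up.
func nextToTerminate(pmp string, ordinals []int, notReady, maxUnavailable int) []int {
	// Ordered pod management, behavior (1): never start a new batch until the
	// previous batch is fully ready, so terminations stay in ordinal order.
	if pmp == "OrderedReady" && notReady > 0 {
		return nil
	}
	// Parallel pod management, behavior (2): refill the unavailability budget
	// as soon as any updated pod becomes ready, with no ordering guarantee
	// inside the budget.
	budget := maxUnavailable - notReady
	if budget <= 0 {
		return nil
	}
	if budget > len(ordinals) {
		budget = len(ordinals)
	}
	return ordinals[:budget]
}

func main() {
	// Pods 4 and 3 were just updated; 3 is ready again, 4 is not.
	remaining := []int{2, 1, 0}
	fmt.Println(nextToTerminate("OrderedReady", remaining, 1, 2)) // []  -- wait for pod 4
	fmt.Println(nextToTerminate("Parallel", remaining, 1, 2))     // [2] -- take the next pod now
}
```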

@krmayankk
Author

@kow3ns can you approve the changes to mark it implementable so that I can get an Alpha in 1.16?

@praseodym
Contributor

For reference: the Kruise third-party project provides an advanced StatefulSet implementation with maxUnavailable and more.

@FillZpp

FillZpp commented Aug 19, 2019

I'm glad to see OpenKruise has been noted in this KEP. @praseodym

Kruise is a set of controllers that extends and complements the Kubernetes core controllers for workload management, and its AdvancedStatefulSet provides features like maxUnavailable, inPlaceUpdate, and so on.

@kow3ns
Member

kow3ns commented Aug 19, 2019

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 19, 2019
@krmayankk
Author

@janetkuo @kow3ns can one of you also lgtm this?

@k8s-ci-robot
Contributor

@krmayankk: you cannot LGTM your own PR.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kow3ns
Member

kow3ns commented Sep 12, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 12, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kow3ns, krmayankk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kow3ns
Member

kow3ns commented Sep 12, 2019

I'm generally not in favor of both approving and lgtm'ing something. However, we've discussed this in the SIG multiple times over a period of more than a month.

@k8s-ci-robot k8s-ci-robot merged commit 8e70bb6 into kubernetes:master Sep 12, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Sep 12, 2019
@krmayankk
Author

Thanks @kow3ns. If there are any follow-up comments or suggestions from other reviewers, I am happy to make follow-up PRs while I make progress on the implementation.

@kerthcet
Member

Kindly ping @krmayankk. Are we going to graduate this feature to beta in v1.26, or do we have any plans?
